Guiding InfoGAN with Semi-Supervision
In this paper we propose a new semi-supervised GAN architecture (ss-InfoGAN)
for image synthesis that leverages information from a few labels (as little as
0.22% and at most 10% of the dataset) to learn semantically meaningful and
controllable data representations in which latent variables correspond to label
categories. The architecture builds on Information Maximizing Generative
Adversarial Networks (InfoGAN), learns both continuous and categorical codes,
and achieves higher-quality synthetic samples than fully unsupervised settings.
Furthermore, we show that using small amounts of labeled data speeds up
training convergence. The architecture retains the ability to disentangle
latent variables for which no labels are available. Finally, we contribute an
information-theoretic analysis of how introducing semi-supervision increases
mutual information between synthetic and real data.
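To make the objective concrete, below is a minimal PyTorch sketch of how the ss-InfoGAN losses could be assembled, assuming a single categorical code with K classes and a discriminator and Q head sharing one trunk; all module shapes, names, and loss weights are illustrative assumptions, not the paper's exact architecture, and optimizer steps and gradient detaching are omitted.

```python
# Illustrative sketch of ss-InfoGAN-style losses; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, Z_DIM, X_DIM = 10, 62, 784          # code classes, noise size, flat image size

G = nn.Sequential(nn.Linear(Z_DIM + K, 256), nn.ReLU(), nn.Linear(256, X_DIM))
body = nn.Sequential(nn.Linear(X_DIM, 256), nn.ReLU())   # shared D/Q trunk
d_head = nn.Linear(256, 1)                               # real/fake logit
q_head = nn.Linear(256, K)                               # code posterior logits

def ss_infogan_losses(real_x, real_y):
    batch = real_x.shape[0]
    z = torch.randn(batch, Z_DIM)
    c = torch.randint(0, K, (batch,))                    # sample categorical code
    fake_x = G(torch.cat([z, F.one_hot(c, K).float()], dim=1))

    d_real, d_fake = d_head(body(real_x)), d_head(body(fake_x))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    # InfoGAN term: minimizing this cross-entropy maximizes a variational
    # lower bound on the mutual information I(c; G(z, c)).
    info_loss = F.cross_entropy(q_head(body(fake_x)), c)
    # Semi-supervised term: on the small labeled subset, tie the code
    # posterior to the true labels so codes align with label categories.
    sup_loss = F.cross_entropy(q_head(body(real_x)), real_y)

    g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return d_loss + sup_loss + info_loss, g_loss + info_loss

d_side, g_side = ss_infogan_losses(torch.randn(8, X_DIM), torch.randint(0, K, (8,)))
```

The only change over plain InfoGAN in this sketch is the supervised cross-entropy on the labeled subset, which is what ties the categorical code to the label categories.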
STCN: Stochastic Temporal Convolutional Networks
Convolutional architectures have recently been shown to be competitive on
many sequence modeling tasks when compared to the de facto standard of
recurrent neural networks (RNNs), while offering computational and modeling
advantages due to their inherent parallelism. However, a performance gap
remains to more expressive stochastic RNN variants, especially those with
several layers of dependent random variables. In this work, we propose
stochastic temporal convolutional networks (STCNs), a novel architecture that
combines the computational advantages of temporal convolutional networks
(TCNs) with the representational power and robustness of stochastic latent
spaces. In particular, we propose a hierarchy of stochastic latent variables
that captures temporal dependencies at different time scales. The architecture
is modular and flexible due to the decoupling of the deterministic and
stochastic layers. We show that the proposed architecture achieves
state-of-the-art log-likelihoods across several tasks. Finally, the model can
predict high-quality synthetic samples over long temporal horizons in
handwritten text modeling.
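As a rough illustration of the idea, the sketch below stacks dilated causal convolutions (the deterministic TCN path) and attaches a Gaussian latent variable to each level, so that deeper latents see longer receptive fields; the layer sizes, the reparameterized sampling, and the way per-level samples are combined are assumptions for illustration, not the paper's exact design.

```python
# Illustrative sketch of a TCN with per-level stochastic latents.
import torch
import torch.nn as nn

class CausalConv(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = dilation * 2          # left-pad so the convolution stays causal
        self.conv = nn.Conv1d(ch, ch, kernel_size=3, dilation=dilation)

    def forward(self, x):                # x: (batch, ch, time)
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

class STCNSketch(nn.Module):
    def __init__(self, ch=32, levels=3):
        super().__init__()
        # Exponentially growing dilations: each level covers a longer time scale.
        self.tcn = nn.ModuleList(CausalConv(ch, 2 ** l) for l in range(levels))
        # One Gaussian latent per level: mean and log-variance heads.
        self.heads = nn.ModuleList(nn.Conv1d(ch, 2 * ch, 1) for _ in range(levels))
        self.out = nn.Conv1d(ch * levels, ch, 1)

    def forward(self, x):
        zs = []
        for conv, head in zip(self.tcn, self.heads):
            x = conv(x)                           # deterministic TCN layer
            mu, logvar = head(x).chunk(2, dim=1)  # per-level latent parameters
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            zs.append(z)
        # Deterministic and stochastic paths are decoupled: the samples from
        # every time scale jointly feed the output.
        return self.out(torch.cat(zs, dim=1))

y = STCNSketch()(torch.randn(2, 32, 100))    # (batch, channels, time)
```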
Learning Human Motion Models for Long-term Predictions
We propose a new architecture for learning predictive spatio-temporal motion
models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM, is
capable of synthesizing natural-looking motion sequences over long time
horizons without catastrophic drift or motion degradation. The model consists
of two components: a 3-layer recurrent neural network that models temporal
aspects, and a novel autoencoder that is trained to implicitly recover the
spatial structure of the human skeleton by randomly removing information about
joints during training. This Dropout Autoencoder (D-AE) is then used to filter
each pose predicted by the LSTM, reducing the accumulation of error, and hence
drift, over time. Furthermore, we propose new evaluation protocols to assess
the quality of synthetic motion sequences even when no ground-truth data
exists. The proposed protocols can be used to assess generated sequences of
arbitrary length. Finally, we evaluate our method on two of the largest
motion-capture datasets available to date and show that our model outperforms
the state of the art on a variety of actions, including cyclic and acyclic
motion, and that it produces natural-looking sequences over longer time
horizons than previous methods.
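A simplified sketch of the two-component design described above, assuming PyTorch: an autoencoder trained on poses with randomly dropped joints is used at prediction time to filter each pose the LSTM emits. The joint count, layer widths, and dropout rate below are illustrative assumptions.

```python
# Illustrative sketch of the Dropout-Autoencoder-filtered LSTM rollout.
import torch
import torch.nn as nn

N_JOINTS, JOINT_DIM = 21, 3
POSE = N_JOINTS * JOINT_DIM

class DropoutAutoencoder(nn.Module):
    def __init__(self, p_drop=0.3):
        super().__init__()
        self.p_drop = p_drop
        self.enc = nn.Sequential(nn.Linear(POSE, 128), nn.ReLU(), nn.Linear(128, 32))
        self.dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, POSE))

    def forward(self, pose):                       # pose: (batch, POSE)
        if self.training:                          # drop whole joints, not single dims
            mask = (torch.rand(pose.shape[0], N_JOINTS, 1) > self.p_drop).float()
            pose = (pose.view(-1, N_JOINTS, JOINT_DIM) * mask).view(-1, POSE)
        return self.dec(self.enc(pose))            # reconstruct the full skeleton

class DAELSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(POSE, 256, num_layers=3, batch_first=True)
        self.proj = nn.Linear(256, POSE)
        self.dae = DropoutAutoencoder()            # assumed pre-trained separately

    def forward(self, seed, horizon):
        self.dae.eval()                            # filter with joint-dropout disabled
        poses = []
        h, state = self.lstm(seed)                 # warm up on the seed sequence
        pose = seed[:, -1]
        for _ in range(horizon):                   # autoregressive rollout
            h, state = self.lstm(pose.unsqueeze(1), state)
            pose = self.dae(self.proj(h[:, -1]))   # de-noise each predicted pose
            poses.append(pose)
        return torch.stack(poses, dim=1)

out = DAELSTM()(torch.randn(2, 10, POSE), horizon=25)   # (2, 25, POSE)
```

Filtering every predicted pose through the autoencoder projects it back toward the learned skeleton structure, which is how the approach limits error accumulation during long rollouts.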
- …